
To the European Patent Office

Entry into the regional phase before the EPO
as designated or elected Office



European application number or, if not
known, PCT application or publication
number

PCT/IB98/01031



Applicant's or representative's reference
(max. 15 spaces)

PHF 90.560 EP



1. Applicant

Indications concerning the applicant(s)
are contained in the international publication
or recorded by the International
Bureau after the international publication.

Changes which have not yet been
recorded by the International Bureau
are set out on an additional sheet.

Address for correspondence

(see Notes II, 1)

2. Representative

Name (Name only one representative
who is to be listed in the Register of
European Patents and to whom
notification is to be made)

LANDOUSY, Christian

Address of place of business

Société Civile "SPID"
156 Boulevard Haussmann
75008 Paris, France

Telephone

+33 1 40 76 80 43

Fax / Telex

+33 1 45 61 05 36

Additional representative(s) on
additional sheet

3. Authorisation

Individual authorisation is attached.

General authorisation has been
registered under No.:

A general authorisation has been filed,
but not yet registered.

The authorisation filed with the EPO as
PCT receiving Office expressly
includes the regional phase.


PCT

REQUEST

Stamp: received (REÇU LE) [day illegible] July 1998, SPID

The undersigned requests that the present
international application be processed
according to the Patent Cooperation Treaty.

For receiving Office use only

International Application No.

International Filing Date

Name of receiving Office and "PCT International Application"

Applicant's or agent's file reference
(if desired) (12 characters maximum)

PHF 98.560WO



Box No. I TITLE OF INVENTION

"Variable bitrate video coding method and corresponding video coder"


Box No. II APPLICANT


Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this
Box is the applicant's State (i.e. country) of residence if no State of residence is indicated below.)

Koninklijke Philips Electronics N.V.
Groenewoudseweg 1
5621 BA Eindhoven
The Netherlands


[ ] This person is also inventor.


Telephone No.

+31402743444


Facsimile No.

+31402743489


Teleprinter No.


State (i.e. country) of nationality:

NL


State (i.e. country) of residence:

NL


This person is applicant for the purposes of: all designated States / all designated States except
the United States of America / the United States of America only / the States indicated in
the Supplemental Box

Box No. III FURTHER APPLICANT(S) AND/OR (FURTHER) INVENTOR(S)


Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this
Box is the applicant's State (i.e. country) of residence if no State of residence is indicated below.)

Philips Norden AB
Kottbygatan 7, Kista
16485 Stockholm
Sweden


This person is:

[x] applicant only
[ ] applicant and inventor

[ ] inventor only (If this check-box
is marked, do not fill in below.)


State (i.e. country) of nationality:

SE


State (i.e. country) of residence:

SE


This person is applicant for the purposes of: all designated States / all designated States except
the United States of America / the United States of America only / the States indicated in
the Supplemental Box


Further applicants and/or (further) inventors are indicated on a continuation sheet.


Box No. IV AGENT OR COMMON REPRESENTATIVE; OR ADDRESS FOR CORRESPONDENCE


The person identified below is hereby/has been appointed to act on behalf
of the applicant(s) before the competent International Authorities as: agent / common representative


Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country.)

LANDOUSY, Christian
Internationaal Octrooibureau B.V.
P.O. Box 220
5600 AE Eindhoven
The Netherlands


Telephone No.

31402743444


Facsimile No.

31402743489


Teleprinter No.


Mark this check-box where no agent or common representative is/has been appointed and the space above is used instead to
indicate a special address to which correspondence should be sent.



« VARIABLE BITRATE VIDEO CODING METHOD AND CORRESPONDING VIDEO CODER » 



The present invention relates to a variable bitrate video coding method 
including an iterative process that comprises a first analysis step, for coding a bitstream 
corresponding to a picture sequence with a constant quantization stepsize, and a second 
prediction step, for predicting the quantization stepsize which must then be used to code 
said bitstream according to a predetermined target bit rate, and is followed by a final 
control step, for adjusting the stepsize with respect to said target bit rate. The invention 
also relates to a corresponding video coder allowing to carry out said method. 

As described in the document « MPEG Video coding : a basic tutorial
introduction », S.R. Ely, BBC Research and Development Report, BBC-RD-1996/3,
pp. 1-10, MPEG activities started in 1988 with the aim of defining standards for digital
compression of video and audio signals. The first goal was to define a video algorithm for
digital storage media such as the CD-ROM (Compact Disc Read-Only Memory), but the
resulting standard was also applied in the Interactive CD system (CD-I). Allowing
transmission and storage of picture data at bit rates in the range of 1 to 15 Mbits/s, this
standard is based on a data compression achieved by using a block-based motion
compensation for the reduction of the temporal redundancy and a discrete cosine
transformation (DCT) for the reduction of the spatial redundancy.

With conventional CD standards such as CD-I and CD-ROM, the transmission 
bit rate is fixed and pictures can therefore only be coded at a constant bit rate. New 
standards such as the Digital Versatile Disc (DVD) allow for transmission of data at a 
variable bit rate (VBR) : complex scenes can be coded at a higher bitrate than scenes 
containing less information, in order to maintain a constant quality. 

An object of the invention is to propose a VBR video coding method allowing 
to obtain such a constant quality of the coded sequence, with a minimal bit rate of the 
encoder output bitstream. 

To this end the invention relates to a VBR video coding method such as 
defined in the preamble of the description and which is moreover characterized in that it 
comprises, between the analysis and prediction steps of one iteration, a picture re- 
arrangement step. 

Such a picture re-arrangement step preferably comprises in series a first 
scene change detection sub-step, a second allocation sub-step, and a third optimum 
placement sub-step. 



In a preferred implementation, said scene change detection sub-step
comprises a correlation operation, carried out between succeeding pictures of the
sequence, and a decision operation, for indicating the possible occurrence of a scene
change.

Another object of the invention is to propose a VBR video coder allowing to 
carry out said coding method. 

To this end the invention relates to a variable bit rate video coder comprising 
a first coding branch, a second prediction branch, and a control circuit provided for 
carrying out the implementation of the following operations : 

a coding operation of a bitstream corresponding to a picture sequence 
with a constant quantization stepsize ; 

a prediction operation, for an estimation of the quantization stepsize 
allowing to code said bitstream according to a specified target bitrate ; 
at least one repetition of said operations ; 

a final controlling operation, for adjusting the stepsize with respect to 
said target bitrate ; 

characterized in that said control circuit is provided for implementing, between the first 
coding operation and the first prediction operation, a picture re-arrangement operation. 

The advantages of the invention will be now explained in a more detailed 
manner in relation with the following description and the accompanying drawings, in 
which : 

- Fig. 1 shows some pictures of a group of pictures, and Figs 2 and 3
illustrate for such a typical group of pictures the difference between the display order of 
the pictures and their transmission order ; 

- Figs 4 and 5 show respectively the main steps of the coding method 
according to the invention and a corresponding architecture of video coder allowing to 
implement said method ; 

- Fig. 6 shows some pictures and the associated motion vectors for a 
sequence with a scene change between two successive B pictures ; 

- Figs. 7 and 8 show values of two detection coefficients DCL and DCR 
computed in order to evaluate quantitatively the motion vector statistics used by the 
macroblocks of the B picture for the three possible positions of a scene cut with respect 
to the two B pictures of an IBBP or a PBBP group of pictures ; 

- Fig. 9 shows values of a similar detection coefficient DCP for P pictures ; 

- Fig. 10 illustrates the performance of the scene change detection method
using only B pictures ; 



- Fig. 11 gives the decision values of the Viterbi algorithm used to solve the 
problem of optimal allocation of groups of pictures (or GOPs) ; 

- Fig. 12 illustrates an example of weighting function for the GOP size (as 
weighting values are used to evaluate the size of the GOPs) ; 

- Fig. 13 shows a control loop used for the implementation of the final control 
step of said coding method. 

Before describing the coding method according to the invention, some basic 
principles of the MPEG-2 video standard may be recalled. The flexibility of this standard, 
intended to support a wide range of possible applications, is obtained thanks to the 
definition of profiles and levels allowing to suit all of the application requirements. A 
profile is a subset of the MPEG-2 standard intended to support only the features needed 
by a given class of application, while a level defines a set of imposed constraints on 
parameters of the bitstream. 

The basic steps of the MPEG-2 compression, applied to colour pictures 
consisting of three components (Y,U,V), concern pictures that are divided into small 
subsections, or macroblocks, themselves consisting of luminance and chrominance blocks. 
These steps are : motion estimation and compensation (based on 16 pixels x 16 lines 
macroblocks), discrete cosine transformation (based on 8 pixels x 8 lines blocks), run- 
length coding. 

Three types of pictures are defined. Intra pictures (or I pictures) are coded 
without reference to other pictures, predictive pictures (or P pictures) are coded using a 
motion-compensated prediction from a past I or P picture, and bidirectional-predictive 
pictures (or B pictures) use both past and future I or P pictures for motion compensation. 
The motion information is given in the form of motion vectors obtained by 
implementation of a block-matching search (in which a large number of trial offsets are 
tested in the coder and the best one is selected on the basis of a measurement of the 
minimum error between the block being coded and the prediction). 
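
The block-matching search described above can be sketched as follows. This is a minimal illustration, not the coder's actual search: the error measure (sum of absolute differences), the function name `block_match` and the tiny 8 x 8 test frames are assumptions made for the example.

```python
def block_match(cur, ref, top, left, size, search):
    """Exhaustively test trial offsets (dy, dx) within +/- search pixels and
    return (error, dy, dx) for the offset with the smallest error."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip trial positions that fall outside the reference picture.
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            # Assumed error measure: sum of absolute differences.
            err = sum(abs(cur[top + r][left + c] - ref[y + r][x + c])
                      for r in range(size) for c in range(size))
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best

# Hypothetical data: the current picture is the reference shifted by (1, 2).
ref = [[r * 16 + c for c in range(8)] for r in range(8)]
cur = [[ref[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)] for r in range(8)]
print(block_match(cur, ref, top=2, left=2, size=4, search=2))
```

The returned offset plays the role of the motion vector for the block; a real coder would use 16 x 16 macroblocks and a much larger search range.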

As indicated in Fig. 1, which illustrates how, on the basis of the motion vectors,
the P and B pictures are defined, the different pictures typically occur in a repeating
sequence which is termed, as said above, a group of pictures, or GOP, and consists of an
I picture and all succeeding pictures until the next I picture occurs. A typical GOP is
shown in display order in Fig. 2 (the black arrows correspond to forward predictions and
the white ones to backward predictions, the sequences and the predictions repeating
periodically) and in transmission order in Fig. 3 (P4, P7, I10, P13 designate the re-ordered
frames), said orders being different to enable, at the decoding side, backward predictions
from future pictures.
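
The relation between display order and transmission order can be sketched as below. The rule used (each I or P picture is sent before the B pictures that precede it in display order) follows from the need for backward prediction; the function name and the picture labels are assumptions for illustration.

```python
def transmission_order(display):
    """Reorder picture labels such as 'B2' from display to transmission order.
    Each reference picture (I or P) is emitted before the B pictures that
    precede it in display order, since those B pictures depend on it."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)      # wait for the next reference picture
        else:
            out.append(pic)            # reference picture goes first
            out.extend(pending_b)      # then the B pictures that needed it
            pending_b = []
    out.extend(pending_b)              # trailing B pictures of the excerpt
    return out

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9", "I10",
           "B11", "B12", "P13"]
print(transmission_order(display))
```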



A regular GOP structure can be described with two parameters, N and M. 
The parameter N, defined as the size of a GOP, is, as shown in Fig.2, the number of 
pictures of said GOP, i.e. the number of pictures between two I pictures plus one. The 
parameter M is the spacing of P pictures, or (which is the same) the number of adjacent 
B pictures plus one. In the illustrated example of Figs. 1 to 3, M = 3 and N = 9. 
Obviously other combinations are possible : 

Picture display order    N     M

IPPPPPIPP                6     1
IBPBPBPBI                8     2
IBBPBBPBBPBBIBBP         12    3

N and M being chosen independently from each other. 
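
As a small check of the two parameters, the sketch below generates the display-order picture types of a regular GOP structure from N and M. The function name is hypothetical, and the rule assumes that a P picture falls on every M-th position inside a GOP.

```python
def gop_display_pattern(n, m, count=None):
    """Return the picture types, in display order, of a regular GOP structure
    with GOP size n and P-picture spacing m, for `count` pictures."""
    count = n if count is None else count
    types = []
    for k in range(count):
        pos = k % n                    # position inside the current GOP
        if pos == 0:
            types.append("I")
        elif pos % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)

print(gop_display_pattern(6, 1, 9))
print(gop_display_pattern(8, 2, 9))
print(gop_display_pattern(12, 3, 16))
```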

After a decision about the kind of macroblock compensation has been taken, 
the prediction error for each pixel of the concerned macroblock is obtained by subtracting 
the estimated macroblock from the original one. A DCT is then performed on the 
prediction error, for a block of 8 x 8 pixels (hence six DCT transforms are determined for 
each macroblock : four for the luminance component, two for the chrominance 
components), and the frequency components thus obtained are quantized. The
quantization stepsize determines the bitrate and the distortion of the decoded image : if
the quantization is coarse, few bits are needed to code a picture, but the final quality is
low, while, if the quantization stepsize is fine, many bits are needed to code the picture,
but the quality is high. As the human eye is less sensitive to the higher frequencies than
to the lower ones, it is advantageous to use coarser quantizers for the high frequency
components (in fact, in order to achieve the frequency dependent quantization, a
weighting matrix is applied to a basic macroblock quantization parameter : many
coefficients, especially those at high frequencies, are equal to 0 after said weighted
quantization).
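
The effect of the stepsize and of the weighting matrix can be illustrated with the simplified sketch below. It is not the MPEG-2 quantizer arithmetic: the division by `weight * q`, the coefficient values and the weighting matrix are all hypothetical, chosen only to show that a coarser stepsize zeroes more high-frequency coefficients.

```python
def quantize_block(coeffs, weights, q):
    """Divide each coefficient by its weighted stepsize (weight * q) and round
    to an integer level with Python's built-in round()."""
    return [[round(c / (w * q)) for c, w in zip(crow, wrow)]
            for crow, wrow in zip(coeffs, weights)]

def count_zeros(levels):
    """Number of quantized levels equal to zero in a block."""
    return sum(level == 0 for row in levels for level in row)

# Hypothetical 8 x 8 data: magnitudes fall and weights grow with frequency.
coeffs = [[200.0 / (1 + u + v) for v in range(8)] for u in range(8)]
weights = [[1.0 + (u + v) / 2 for v in range(8)] for u in range(8)]

fine = quantize_block(coeffs, weights, q=2)
coarse = quantize_block(coeffs, weights, q=16)
print(count_zeros(fine), count_zeros(coarse))
```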

Each block is then zigzag scanned and the obtained list is coded. The run- 
length coding is done by determining a pair (A, NZ) where A designates the number of 
consecutive zeros (0 to 63) and NZ the amplitude of the following non-zero coefficient. 
A variable length code is then assigned to this pair (A,NZ), depending on the frequency of 
occurrence of this pair (a combination (A,NZ) which is common is assigned a short 
variable length code, whereas a pair which is less frequent is assigned a long one). 
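
The zigzag scan and the formation of the (A, NZ) pairs can be sketched as follows. The variable-length code tables themselves are not reproduced; dropping the trailing zeros (instead of emitting an end-of-block symbol) is a simplification assumed for this example, and the function names are hypothetical.

```python
def zigzag_order(size=8):
    """Return the (row, col) positions of a size x size block in zigzag order."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Walk the anti-diagonals in turn; odd diagonals are read with the row
    # index increasing, even diagonals with the row index decreasing.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_length_pairs(block):
    """Return the list of (A, NZ) pairs: A zeros followed by the value NZ."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # zeros after the last non-zero coefficient are dropped

block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[1][1] = 12, -3, 5
print(run_length_pairs(block))
```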

The functions described hereabove are summarized for example in the
document « Hybrid extended MPEG video coding algorithm for general video
applications », C.T. Chen et al., Signal Processing : Image Communication 5 (1993),
pp. 21-37, part 2.4, which includes the scheme of a generalized MPEG-2 coder. The aim of
the VBR coding method now proposed is then to use the information that is gained from
preceding coding steps carried out in several successive analysis passes to perform an
adaptive allocation of the picture types, which makes it possible to minimize the size of the final
bitstream in order to fit exactly on a storage medium with a fixed capacity (like a DVD). A
diagram of said method is given in Fig. 4, and an architecture of video coder with picture
rearrangement allowing to implement said method is illustrated in Fig. 5.

The coding method is divided into four steps 41, 42, 43 and 44. The first 
step 41 is an analysis one, in which a picture sequence is coded with a constant 
quantization stepsize $Q_c$ (and therefore with a constant quality). At the end of this step, a
regular MPEG-2 compliant bit stream has been generated, but the average bit rate of the 
whole sequence thus processed (i.e. the quotient of the total number of coded bits over 
the sequence by the total number of pictures in that sequence), unknown before the end 
of said step, does not fulfil the required constraint of a specific size of the bitstream. 

The second step 42 is a picture re-ordering one, performed after the analysis 
step 41. This re-ordering step may be itself subdivided into three sub-steps 110 to 130. 
The task of optimizing the picture allocation can be considered as comprising two 
separate parts. The first one is the improvement of the placement of the I pictures, which 
is equivalent to an optimization of the GOP allocation, while the second one is the most 
efficient placement of B and P pictures. 

The first and second sub-steps 110 and 120 constitute the first part of said 
optimizing task. It is clear that I pictures, that do not exploit the temporal correlation 
between successive pictures of a sequence, are the most costly ones in terms of bit rate. 
On the other hand, they are necessary to allow random access to a sequence, and 
random access is important for many applications. Moreover, since quick random access
is often wanted, care must be taken that a given maximum distance between I
pictures is not exceeded (for instance, at most 12 pictures). When a scene change
occurs, the pictures before and after the cut (left and right pictures) are uncorrelated.
The motion compensation is not well performed for the first P picture in the new scene,
and its bit rate is therefore approximately that of an I picture. Such an I picture can then
be placed instead of said P picture without an extra cost in bandwidth. The strategy for
an optimal placement of I pictures must then allocate the I pictures at the beginning of a
new scene whenever possible.

The first sub-step 110, which is a scene change detection sub-step, allows 
for such an allocation. In order to detect scene changes, the correlation of succeeding 
pictures of the sequence is examined (preferably after the motion compensation) : if two 
adjacent pictures are almost uncorrelated, it is likely that a new scene begins with the 
second one. In a basic MPEG-2 coding process, several parameters give some information 
about the correlation between successive pictures : 

- the complexity of a P or B picture : the connection between complexity and
correlation is however not always verified (a low bit rate being sometimes due to a high
correlation with the reference picture or to a low intra complexity, with a totally black
picture, for instance) ;

- a better estimation of the correlation (although more expensive in
computation time) is obtained by comparing the macroblocks MB of a picture with their
reference macroblocks (always provided by the motion compensation unit, whatever the
type of block coding : intra or inter) : the comparison can be done by means of a
computation of the squared-error distortion d(MB), which is for example given by the
relation (1) :

$$d(MB) = \frac{1}{256} \sum_{i=1}^{256} \bigl(P(i) - M(i)\bigr)^2 \qquad (1)$$

where P(i) is a pixel of the analyzed macroblock MB and M(i) is a pixel of the reference
macroblock ;
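
A minimal sketch of this distortion measure is given below. Normalizing the sum of squared differences by the number of pixels (256 for a 16 x 16 macroblock) is an assumption of the example, as is the function name.

```python
def squared_error_distortion(mb, ref_mb):
    """Mean of the squared pixel differences between a macroblock and its
    reference macroblock (both given as lists of rows of pixel values)."""
    n = sum(len(row) for row in mb)    # number of pixels, 256 for 16 x 16
    total = sum((p - m) ** 2
                for prow, mrow in zip(mb, ref_mb)
                for p, m in zip(prow, mrow))
    return total / n

# Hypothetical macroblocks: a uniform difference of 3 grey levels per pixel.
mb = [[10] * 16 for _ in range(16)]
ref_mb = [[7] * 16 for _ in range(16)]
print(squared_error_distortion(mb, ref_mb))
```

A high value indicates a poor match with the reference, and therefore a low correlation between the two pictures.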

- the motion estimation being macroblock-oriented, a picture is predicted 
using several motion compensation options : 

Picture type    Motion compensation option

I               Intra (i.e. no motion compensation)
P               Intra
P               Forward
P               Not compensated
B               Intra
B               Forward
B               Backward
B               Interpolated

and the motion compensation statistics may convey information about picture 
correlation : if most macroblocks are intra coded, the correlation with the reference 
picture(s) is low and vice-versa. 

The implemented embodiment uses said motion compensation statistics for 
the detection of scene changes, in the case where only B pictures are used for instance, 
as seen in Fig. 6 which shows pictures and motion vectors for a sequence PBBP with a 
scene change illustrated between the two B pictures (broken arrows indicate that fewer
macroblocks of the concerned reference picture are used to predict the dependent 
picture, the correlation being lower). As the scene cut occurs between the two B pictures, 
the first one uses almost only the preceding P picture as reference picture, since it is 
basically uncorrelated with the following P picture. Similarly, the second B picture is 
almost uncorrelated with the preceding P picture and uses almost only the following P 
picture as reference. 

A scene change can be placed before, between, or after two adjacent B
pictures, in a group of three pictures such as illustrated (PBB, or IBB). The table given
hereunder shows, for the three possible positions of the scene cut, the motion
compensation that is used by most macroblocks of the first B picture and by most
macroblocks of the second B picture :

POSITION    FIRST PICTURE    SECOND PICTURE

before      backward         backward
between     forward          backward
after       forward          forward

(the direction of the motion compensation being "seen" from the point of view of the B
pictures).

To evaluate quantitatively the motion vector statistics mentioned hereabove,
two detection coefficients DCL and DCR are computed (DC for "detection coefficient", L
and R for "left" and "right", MC for "motion compensated") :

$$\mathrm{DCL} = \frac{(\text{intra}) + (\text{backward MC})}{(\text{forward MC}) + (\text{interpolated MC})} \qquad (2)$$

$$\mathrm{DCR} = \frac{(\text{intra}) + (\text{forward MC})}{(\text{backward MC}) + (\text{interpolated MC})} \qquad (3)$$

For P pictures, the detection coefficient can be similarly defined :

$$\mathrm{DCP} = \frac{(\text{intra})}{(\text{forward MC}) + (\text{not MC})} \qquad (4)$$
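
The detection coefficients can be computed from per-picture macroblock counts as sketched below. Several points are assumptions of this example: each term is read as the number of macroblocks coded with the named option, the numerator of DCP is taken to be the intra count by analogy with DCL and DCR, an empty denominator is replaced by 1 to avoid a division by zero, and the counts are invented.

```python
def _ratio(num, den):
    # Guard against an empty denominator (an assumption of this sketch).
    return num / max(den, 1)

def b_picture_coefficients(intra, forward, backward, interpolated):
    """Return (DCL, DCR) for one B picture from its macroblock counts."""
    dcl = _ratio(intra + backward, forward + interpolated)
    dcr = _ratio(intra + forward, backward + interpolated)
    return dcl, dcr

def p_picture_coefficient(intra, forward, not_compensated):
    """Return DCP for one P picture from its macroblock counts."""
    return _ratio(intra, forward + not_compensated)

# Hypothetical B picture just before a scene cut: mostly forward compensated.
dcl, dcr = b_picture_coefficients(intra=100, forward=1100, backward=20,
                                  interpolated=100)
print(dcl, dcr)
```

With these counts DCR is far larger than DCL, which is the signature of a cut located on the right side of the B picture.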

As shown in Figs. 7 and 8, which indicate values of the left and right detection
coefficients DCL and DCR for successive B pictures, scene cuts clearly correspond to
spikes. Similarly, the detection coefficients DCP for P pictures are shown in Fig. 9
(obviously, the information conveyed by the motion vectors of P pictures is much less
reliable than that provided by the B pictures).

In case (for example) of a scene change that occurs after the two B pictures,
i.e. between the second B picture and the following reference picture (on the right side of
said second B picture), few macroblocks of the analyzed B picture are backward
compensated or interpolated, since the correlation between said B picture and the
following reference picture is low, and a majority of them are intra or forward motion
compensated : the value of the detection coefficient DCR is therefore high, whereas the
value of the detection coefficient DCL is not increased (on the contrary, in case of a scene
change on the left side of the first B picture, between the previous reference picture and
said B picture, DCL has a high value and DCR remains small, while both DCL and DCR
have a small value if no scene change occurs in the block of M pictures). In fact, in order
to have a single, symmetric indicator of scene changes, the difference DDV between both
detection values is computed, which yields :

$$\mathrm{DDV} = \mathrm{DCL} - \mathrm{DCR} \qquad (5)$$

that is to say :

$$\mathrm{DDV} = \frac{(\text{intra}) + (\text{backward MC})}{(\text{forward MC}) + (\text{interpolated MC})} - \frac{(\text{intra}) + (\text{forward MC})}{(\text{backward MC}) + (\text{interpolated MC})} \qquad (6)$$

$$\mathrm{DDV} = \frac{(\text{NBMB per picture}) \times \bigl((\text{backward MC}) - (\text{forward MC})\bigr)}{\bigl((\text{forward MC}) + (\text{interpolated MC})\bigr) \times \bigl((\text{backward MC}) + (\text{interpolated MC})\bigr)} \qquad (7)$$
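
The passage from the difference of the two ratios to the single-fraction form can be checked numerically. The sketch below assumes that NBMB per picture is the total macroblock count and that every macroblock falls in exactly one of the four classes (intra, forward, backward, interpolated); putting the difference over a common denominator then gives the numerator (backward - forward) x (intra + forward + backward + interpolated). The function names and counts are hypothetical.

```python
from fractions import Fraction

def ddv_direct(intra, forward, backward, interpolated):
    """DDV computed as DCL - DCR, in exact rational arithmetic."""
    dcl = Fraction(intra + backward, forward + interpolated)
    dcr = Fraction(intra + forward, backward + interpolated)
    return dcl - dcr

def ddv_closed_form(intra, forward, backward, interpolated):
    """DDV computed with the single-fraction form using the macroblock total."""
    nbmb = intra + forward + backward + interpolated
    return Fraction(nbmb * (backward - forward),
                    (forward + interpolated) * (backward + interpolated))

counts = (100, 1100, 20, 100)   # intra, forward, backward, interpolated
print(ddv_direct(*counts), ddv_closed_form(*counts))
```

The closed form makes the sign of DDV easy to read: it is positive when backward compensation dominates and negative when forward compensation dominates.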

This difference DDV, called motion compensation ratio, is computed for each
B picture of each group of three pictures IBB or PBB. As it is assumed that there is no
more than one scene change for each group, a decision value DVL measuring the
probability of such a scene change is determined by adding the absolute values of DDV
for the two adjacent B pictures :

$$\mathrm{DVL} = \frac{|\mathrm{DDV}(1)| + |\mathrm{DDV}(2)|}{2} \qquad (8)$$

the numbers 1 and 2 indicating whether the ratio is related to the first or to the second
of the two succeeding pictures. The exact position of the scene change with respect to
the bidirectional pictures can then be determined by looking at the signs of the two
ratios :

- if DDV(1) > 0 and DDV(2) > 0, the scene change has occurred before the first
B picture ;

- if DDV(1) < 0 and DDV(2) > 0, the scene change has occurred between
the two B pictures (the first B picture being then mostly forward compensated and the
second one mostly backward compensated) ;

- if DDV(1) < 0 and DDV(2) < 0, the scene change has occurred after the two B
pictures.
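
The decision value and the sign test can be sketched together as below. The sign convention is the one that follows from DDV = DCL - DCR and from the table of dominant motion compensation: a mostly forward-compensated B picture gives a negative ratio, a mostly backward-compensated one a positive ratio. Reporting the remaining sign combination as "undetermined" and the function name are assumptions of this example.

```python
def scene_change_decision(ddv1, ddv2):
    """Return (DVL, position) for a group of two adjacent B pictures, where
    ddv1 and ddv2 are the motion compensation ratios of the first and of the
    second B picture."""
    dvl = (abs(ddv1) + abs(ddv2)) / 2
    if ddv1 > 0 and ddv2 > 0:
        position = "before"        # both pictures mostly backward compensated
    elif ddv1 < 0 and ddv2 > 0:
        position = "between"       # forward, then backward
    elif ddv1 < 0 and ddv2 < 0:
        position = "after"         # both pictures mostly forward compensated
    else:
        position = "undetermined"  # not covered by the three listed cases
    return dvl, position

print(scene_change_decision(-9.9, 8.0))
```

The position is only meaningful when DVL is large enough to indicate a scene change; the threshold is left to the caller.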

The performance of the scene change detection method using only B
pictures is shown in Fig. 10. One decision value for each IBB or PBB group is computed,
and it may be observed that :

- the spikes of the decision values are at the same position as the real scene
cuts ;

- the noise around the macroblock n°50 is caused by light effects in the
sequence, which disturb the motion estimation algorithm and therefore the motion
compensation dependent scene cut prediction ;

- the last part of the examined video sequence is basically a standing image
(the pictures are almost identical) : it is therefore undefined which motion compensation
is used, since the reference macroblock is the same for all compensation types, and the
decision values have consequently a non-negligible value although no scene change
occurs (to reduce this risk of wrong scene change predictions, it can then be useful to
consider the motion vector statistics of P pictures in addition to those of B pictures : if the
detection coefficient for a P picture is low, no scene change has occurred for the three
preceding pictures).
The second sub-step 120 is a GOP allocation sub-step. An optimal allocation
of a GOP is determined by two conflicting aims :

(a) the first one is to select a preferred size for the GOP : if a GOP is too
small, bits are wasted because more costly I pictures are allocated than necessary, while
random access is impaired if a GOP is too big ;

(b) the second one is to match the start picture of a GOP with the position of
a scene change.

Hence the problem of GOP allocation is to arrange the GOPs in an optimal
way while meeting the constraints (a) and (b) (i.e. to start a new GOP at the beginning of
a new scene, a maximum and a minimum size of said GOP being respected). In order to
solve this optimization problem, a Viterbi algorithm is used : for each path the deviation
from the preferred size of the GOP is penalized whereas the inclusion of a probable scene
change at the start of a GOP is rewarded, the cumulative sum of all decision values
determining the path which is chosen for each picture.

This algorithm finds the optimal start positions of the GOPs over the
sequence. Every picture has an attached scene change decision value which describes the
probability of a scene cut at the respective position : if the decision value is big, it means
that there is a high probability for a scene change at that position, and it is therefore
profitable to allocate a new GOP. However, as the GOPs must be neither too
small nor too big, the transitions between the GOP start points (i.e. the sizes of the GOPs)
are also weighted.

In Fig. 11, the vertical lines represent pictures, the parameters $S_i$ describe the
scene change probabilities of the respective picture (it is assumed that only one scene
change occurs for a group of three pictures, hence only one decision value $D_i(N)$ will exist
for each PBB or IBB block), and the W parameters are the weighting values which
evaluate the size of the GOP (sizes close to an optimum size, such as 12, being
preferred). The decision value of a path which ends at picture i is then computed as
follows :

$$D_i(N) = Q_{i-N} + W(N) \qquad (9)$$

with N being the size of the considered GOP and $Q_{i-N}$ being the sum of all weights S and
W for the optimum allocation of GOPs from picture "1" up to picture "i-N". The GOP size
is chosen which has the highest decision value. The weighting function W = f(N) is a
quadratic one, and W therefore decreases proportionally with the squared difference
between the GOP size and the preferred GOP size, such as indicated in Fig. 12 illustrating
an example of weighting function for the GOP size (the proposed weight function causes
all GOPs between two scene changes to have approximately the same size : hence, if two
scene changes have a distance of 16 pictures, two GOPs with a size of 8 are allocated
rather than one GOP with the size of 10 and another with the size of 6).
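
The GOP allocation can be sketched as a Viterbi-style dynamic programme. The sketch below is an illustration under stated assumptions rather than the coder's implementation: pictures are indexed from 0, the first picture always starts a GOP, the scene change score of a picture is added when a GOP starts there, the weight is W(N) = -(N - preferred)^2, and the size limits (4 to 12) and the scores are invented.

```python
def gop_weight(n, preferred=12):
    """Quadratic weight: 0 at the preferred GOP size, negative elsewhere."""
    return -(n - preferred) ** 2

def allocate_gops(scene_scores, n_min=4, n_max=12, preferred=12):
    """Return the GOP start positions maximizing the cumulative sum of scene
    change scores (at GOP starts) and GOP size weights."""
    total = len(scene_scores)
    neg_inf = float("-inf")
    best = [neg_inf] * (total + 1)  # best[i]: best score with a GOP boundary at i
    prev = [None] * (total + 1)     # start of the GOP that ends at boundary i
    best[0] = scene_scores[0]
    for i in range(1, total + 1):
        start_reward = scene_scores[i] if i < total else 0
        for n in range(n_min, min(n_max, i) + 1):
            if best[i - n] == neg_inf:
                continue
            decision = best[i - n] + gop_weight(n, preferred)  # D_i(N)
            if decision + start_reward > best[i]:
                best[i] = decision + start_reward
                prev[i] = i - n
    starts, i = [], total
    while prev[i] is not None:      # backtrack from the end of the sequence
        i = prev[i]
        starts.append(i)
    return starts[::-1]

# Hypothetical sequence of 32 pictures with scene changes at pictures 0 and 16.
scores = [0] * 32
scores[0] = scores[16] = 1000
print(allocate_gops(scores))
```

In this example the 16 pictures between the two scene changes are split into two GOPs of 8 pictures, since the quadratic weight penalizes a 10 + 6 split more heavily.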

Up to now, however, it has not been considered that there is a difference
between the transmission order and the display order of pictures. If the start of a GOP
is allocated at the first picture after a scene change, then the first group
of M pictures in the GOP also starts at the scene cut. However, the I picture is the last
picture of the block to be displayed. If M = 3, the first two pictures of the GOP are coded
as B pictures and only the third one is an I picture. Therefore the start of the GOPs can
be shifted by one or two positions to the left in order to guarantee that the first picture
after a scene change is really an I picture and not a B one.
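
The shift may be sketched as follows, under the assumption (not stated explicitly in the
description) that the shift equals M - 1 positions, which gives one position for M = 2 and
two positions for M = 3. The function names and the picture index used in the example are
hypothetical.

```python
def shifted_gop_start(scene_cut, m):
    """Display-order index at which the GOP should start so that its I picture
    (the last of the first M pictures) falls on the first picture after the cut."""
    return scene_cut - (m - 1)


def first_group_types(m):
    """Display-order picture types of the first M pictures of a GOP."""
    return ["B"] * (m - 1) + ["I"]


cut = 10  # hypothetical display index of the first picture after a scene change
for m in (1, 2, 3):
    start = shifted_gop_start(cut, m)
    i_position = start + first_group_types(m).index("I")
    print(m, start, i_position)
```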

The third sub-step 130 is a P and B picture allocation sub-step. With a view to
optimizing the time-dependent parameter M, searching adaptively for the best
placement of B and P pictures minimizes the bitrate needed for the
coding of the sequence. Increasing the value of M increases the bitrate of the P pictures,
but more bitrate-efficient B pictures are then used instead of P pictures. The correlation
between succeeding pictures is therefore the most important parameter for the
optimization, which is in fact subdivided into two sub-tasks :

(a) a long-term optimization, in order to find the optimum M over several GOPs ;

(b) a short-term optimization, in order to find the best place of B and P
pictures inside a GOP while taking into account the local variations of the correlation
between pictures.

With respect to the long-term optimization, it must be noted that, if the
correlation coefficient between successive pictures tends toward one, it does not matter
whether a B picture or a P picture is chosen since almost no coefficient bits remain in any
case, while motion compensation does not work if said correlation is very low. In these
extreme cases (respectively a standing image and uncorrelated pictures), it is not obvious
which M is to be preferred. In the other cases, it is generally possible to say that a small
M performs well for a weakly correlated sequence and that a big M is better for a sequence
with high correlation. The best results for the long-term optimization of M are obtained if
experiments are performed over a large number of scenes.

With respect to the short-term optimization, it may be added that M can 
arbitrarily vary inside each GOP, which makes it possible to use short-term variations of 
the correlation between pictures in order to minimize the bitrate. An example for short- 
term optimization of M is given in the following table, indicating the choice of M before a 
scene change : 

              SCENE 1                            SCENE 2
POSITION      1      2             3             4
M = 1         P      P             P             I
M = 2         P      P             B (like P)    I
M = 3         P      B (like P)    B             I

Obviously, the B pictures before the scene change can only be forward predicted. It does
not make a big difference whether M = 1 or 2 before the new scene, because the B
picture before the scene change behaves like a P picture ; a choice of M = 3 is clearly
worse because the B picture at position 3 uses a reference picture (the preceding P
picture) at position 1, hence at a distance of two positions. Since the correlation between
pictures decreases as their distance from each other increases, the bitrate of the B
picture at position 3 is higher for M = 3 than the bitrates of the pictures at the same
position for M = 1 or 2.
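
The reference distances discussed above can be checked with the following short sketch.
It assumes, as in the table, three pictures before the scene change, of which the last
M - 1 are B pictures that are only forward predicted ; the function names are illustrative.

```python
def types_before_cut(m, length=3):
    """Picture types of the last `length` pictures of a scene, as in the table:
    the final M - 1 pictures are B pictures, the others P pictures."""
    n_b = min(m - 1, length - 1)
    return ["P"] * (length - n_b) + ["B"] * n_b


def forward_reference_distance(types, position):
    """Distance from the picture at `position` (1-based) to the nearest
    preceding non-B picture, or None if there is none in the list."""
    idx = position - 1
    for back in range(1, idx + 1):
        if types[idx - back] != "B":
            return back
    return None


for m in (1, 2, 3):
    types = types_before_cut(m)
    print(m, "".join(types), forward_reference_distance(types, 3))
```

For M = 1 and M = 2 the picture at position 3 is predicted from position 2, whereas for
M = 3 it is predicted from position 1, which is the case identified above as the worst one.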

The third step 43 is a prediction one, intended to predict the quantization 
stepsize Q which must be used to code the bitstream according to the specific target 
bitrate. Once said prediction step is completed, the analysis step 41 may be repeated 
(arrow in Fig.4) as often as necessary in order to get a more precise estimation for Q 
(however, a good prediction is generally obtained after a few runs, for instance two). 

As the quantization stepsize Q available at the end of this prediction step is only
an estimated value, the total bit budget is not exactly matched if every picture is coded
by using said predicted value. A final step 44 is therefore provided, which guarantees that
the constraint on the total average bit rate is strictly observed. To ensure that the final
output bitstream has exactly the desired size, a quantization stepsize control process is
implemented. This process is based on a control loop relying on a comparison of
predicted and real bit rates. After the coding of each picture in the final step, the control
process compares the total number of bits that have been spent with the allowed one. If
more bits have been spent than the budget allows, the quantization stepsize is increased,
and the bit rate of the following pictures is reduced. If fewer bits have been spent than
the budget allows, Q is decreased and the bit rate is increased, the total target bit rate
being finally exactly matched.
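
The direction of the correction may be illustrated by the sketch below. The proportional
update rule, the gain and the bounds on Q are assumptions chosen for illustration only and
are not taken from the description, which specifies only that Q is increased when too many
bits have been spent and decreased in the opposite case.

```python
def update_stepsize(q, bits_spent, bits_allowed, gain=0.5, q_min=1.0, q_max=31.0):
    """Return the stepsize for the next picture.

    The relative budget error is positive when more bits have been spent than
    allowed, which raises Q; it is negative otherwise, which lowers Q.
    gain, q_min and q_max are hypothetical tuning values.
    """
    error = (bits_spent - bits_allowed) / bits_allowed
    return min(q_max, max(q_min, q * (1.0 + gain * error)))


q = 10.0
print(update_stepsize(q, bits_spent=1_200_000, bits_allowed=1_000_000))  # overspent
print(update_stepsize(q, bits_spent=800_000, bits_allowed=1_000_000))    # underspent
```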

Said VBR coding method may be implemented in a coder having an 
architecture such as shown in Fig. 5, where each block corresponds to a particular 
function that is performed under the supervision of a controller 55. The illustrated coder 
comprises in series an input buffer 51, a subtracter 549, a DCT circuit 521, a quantization
circuit 522, a variable length coding circuit 523, and an output buffer 524. The circuits 
521 to 524 constitute the main elements of a coding branch 52, to which a prediction 
branch 53, including an inverse quantization circuit 531, an inverse DCT circuit 532 and a 
prediction sub-system, is associated. This prediction sub-system itself comprises an adder 
541, a buffer 542, a motion estimation circuit 543 (said estimation is based on an analysis
of the input signals available at the output of the buffer 51), a motion compensation 
circuit 544 (the output signals of which are sent backwards to the second input of the 
adder 541), and the subtracter 549 (receiving the output signals of the buffer 51 and the 
output signals of the motion compensation circuit 544, for sending the difference of said 
signals towards the coding branch). 

The output of the illustrated coder is sent towards the controller 55 that 
includes the control loop provided to carry out the final step 44. The main elements of 
said control loop for the final pass of the VBR coder are shown in Fig. 13. As already 
explained, it is necessary to adjust the quantization stepsize during this final coding pass, 
in order to ensure that the total target bit rate given by the operator is exactly matched. 
Said loop first comprises a first computation circuit 131 in which the output of the loop 
(i.e. the cumulative prediction error) is multiplied by a factor KP. This factor is itself equal 
to a constant QQ (chosen by the operator) multiplied by a weighting factor Qint/APG,
where Qint is an integrative estimation of Q and APG the total number of bits for a GOP
(of N pictures). 

An adder 133 then adds the output Qprop of said circuit 131 and the signal
Qint, available at the output of a second computation circuit 132 provided for yielding an
integrative estimation of Q. A conversion circuit 134 gives the cumulative bitrate for all
preceding pictures, on the basis of a relation R = fj(Q) (between the quantization factor Q
at the output of the adder 133 and the bitrate R) stored in said circuit 134. The cumulative
bitrate thus obtained is compared in a comparator 135 with the cumulative predicted
bitrate available on a second input of said comparator and is used, after an integration in
a circuit 136, in order to modify Q accordingly.
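
The combination performed by the circuits 131 and 133 may be sketched as follows. The
sign convention of the cumulative error (bits spent minus bits predicted) and the names of
the function and of its parameters are assumptions ; the computation of Qint itself and the
relation stored in the circuit 134 are not modelled.

```python
def control_step(q_int, cumulative_error_bits, gop_bits, constant=1.0):
    """Return the stepsize Q = Qprop + Qint for the next picture.

    q_int: integrative estimation of Q.
    cumulative_error_bits: cumulative prediction error, taken here as bits
        spent minus bits predicted (assumed sign convention).
    gop_bits: total number of bits for a GOP (APG in the description).
    constant: operator-chosen constant entering the factor KP.
    """
    k_p = constant * q_int / gop_bits      # KP = constant * Qint / APG
    q_prop = k_p * cumulative_error_bits   # proportional correction Qprop
    return q_int + q_prop                  # sum formed by the adder


print(control_step(q_int=10.0, cumulative_error_bits=100_000, gop_bits=1_000_000))
```

Scaling KP by Qint/APG makes the correction relative : an error of a given fraction of the
GOP budget changes Q by the same fraction of Qint, multiplied by the operator's constant.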

The VBR coding strategy as presented above is an improvement with respect 
to previous VBR coders because it achieves a better equalization of the perceptual quality
of the decoded sequence. The classical VBR coders adjust the quantization parameter Q 
while coding a picture, so that the predicted bit rate is matched for every picture. Hence 
they allow the quantization parameter Q to vary inside a picture, and no constant spatial 
quality of the picture can be achieved. This variation in quality occurs whether the bit rate 
of the picture is correctly predicted or not. For the proposed VBR coder, Q is kept 
constant over a picture and the spatial quality of any picture in the video sequence does 
not vary. If the picture bit rates and the quantization stepwidth are correctly estimated, 
the Q before adaptive quantization, and therewith the subjective distortion, remains 
exactly constant for all macroblocks of the sequence. Since the quantization stepwidth 
and the picture bit rates are only estimated, a variation of Q, and hence of the quality of 
the sequence, occurs from picture to picture, but, after some analysis passes, the 
deviations of Q, averaged over a picture, are generally below 1 %. 



Apart from the attainment of a constant intra-picture quality, several other 
important aspects of the new VBR strategy may be mentioned : 

- it is possible to improve the prediction of the quantization factor in an iterative 
way by increasing the number of analysis passes : if, after the analysis run, the deviation
from the wanted target bit rate is still too high, a better estimation for the quantization 
factor can be calculated using the results from the previous coding passes ; 

- as the new VBR coding strategy predicts Q, analysis passes that are 
performed with another picture order than the predicted pass can be exploited : this is 
impossible using old strategies, and this is a major advantage of the new coding 
concept ; 

- if, in the final pass, the variations of Q and consequently of the quality turn 
out to be unacceptably high, the final step can be used as an analysis pass for the
prediction of Q and of the bitrate for the subsequent pass : using this feature, it is 
possible to develop a coder that performs as many coding passes as needed until the 
characteristics of the output bitstream are within certain limits defined by the operator ; 

- as the control loop has an integrative character, short-term bit rate 
prediction errors cancel each other out : therefore, systematic, picture-type dependent 
prediction errors do not seriously affect the performance of the proposed VBR coder. 

The invention is obviously not limited to the embodiment described 
hereinbefore, from which variations or improvements may be conceived without 
departing from the scope of said invention. For instance, an optional fourth sub-step, 
referenced 140 in Fig.4 and shown with connections in dotted lines, may be included into 
the re-ordering step 42, as now explained. 

In order to code a sequence exactly at a given bitrate R(t) in the final pass 
of the last step 44, it is indeed necessary to predict a target quantization stepwidth Q and 
the target picture bitrates R(i). For the execution of an analysis pass, the only 
requirement is to have a prediction of Q. As no control system for Q is used during the 
first analysis step 41, no prediction of the picture targets is necessary. For the estimation 
of the bitrates R(i) and the stepwidth Q, the quantization factor and the picture bitrates 
of the previous coding pass are needed. However, if the order of the picture types is 
changed between the two passes, the same picture of a sequence may be coded by two 
different picture types in said two passes. 

If one considers for example that the analysis pass was performed using N = 
12 and M = 3, whereas the predicted pass is coded with N = 8 and M = 2, the 
corresponding picture allocations are shown in the following table : 

N     M     Picture display order

12    3     BBIBBPBBPBBPBBIB
8     2     BIBPBPBPBIBPBBBP

where the second picture is coded as a B picture in the first pass and as an I picture in
the second pass. The bitrate prediction predicts the target bitrate of a picture assuming
the same type as that of the picture in the first analysis pass : if a picture was
coded as a B picture in said first pass, the bitrate of a B picture is hence predicted for the
second pass. In case of a modified picture order, the predicted picture bitrates are
therefore useless.
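
The mismatch between the two passes can be made visible with the sketch below, which
generates a regular display-order pattern from N and M and lists the pictures whose type
changes. The generation rule (M - 1 leading B pictures, one I picture every N pictures,
one P picture every M pictures in between) is an assumption that reproduces the first
row of the table ; the function names are illustrative.

```python
def display_order_types(n, m, length):
    """Display-order picture types for GOP size n and anchor spacing m."""
    types = []
    for p in range(length):
        offset = p - (m - 1)          # position relative to the first I picture
        if offset % m != 0:
            types.append("B")         # not an anchor picture
        elif offset % n == 0:
            types.append("I")         # anchor at the start of a GOP
        else:
            types.append("P")
    return "".join(types)


def changed_positions(first_pass, second_pass):
    """1-based positions whose picture type differs between the two passes."""
    return [i + 1 for i, (a, b) in enumerate(zip(first_pass, second_pass)) if a != b]


analysis = display_order_types(12, 3, 16)
predicted = display_order_types(8, 2, 16)
print(analysis)
print(predicted)
print(changed_positions(analysis, predicted))
```

Every position returned by changed_positions is a picture for which a bitrate predicted
from the analysis pass would refer to the wrong picture type.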

As the prediction of the target picture bitrates is not possible after a picture
re-arrangement, the final coding pass cannot be performed directly after the picture
re-ordering. A second analysis pass must therefore be carried out before said final coding
pass : hence at least three coding passes are needed in that case for the VBR coder
according to the invention. In order to guarantee that the predicted picture bitrates for
the final pass are not too wrong, an additional "inter-picture" prediction sub-step may
therefore be provided, which estimates the bitrates that the pictures would have if the
analysis pass had been performed with the new picture order instead of the old one. This
additional sub-step 140, which is, as already said, optional, exploits the temporal
correlation of picture bitrates.

CLAIMS :

1. A variable bitrate video coding method including an iterative process that 
comprises a first analysis step, for coding a bitstream corresponding to a picture 
sequence with a constant quantization stepsize, and a second prediction step, for 
predicting the quantization stepsize which must then be used to code said bitstream 
according to a predetermined target bit rate, and is followed by a final control step, for 
adjusting the stepsize with respect to said target bit rate, said method being 
characterized in that it comprises, between the analysis and prediction steps of one 
iteration, a picture re-arrangement step. 

2. A method according to claim 1, characterized in that said picture re- 
arrangement step itself comprises in series a first scene change detection sub-step, a 
second allocation sub-step, and a third optimum placement sub-step. 

3. A method according to claim 2, characterized in that said scene change 
detection sub-step comprises a correlation operation, carried out between succeeding
pictures of the sequence, and a decision operation, for indicating the possible occurrence 
of a scene change. 

4. A method according to claim 3, characterized in that said correlation 
operation is based on a picture complexity estimation. 

5. A method according to claim 3, characterized in that said correlation 
operation is based on the comparison of the blocks of a picture with reference blocks in a 
previous reference picture. 

6. A method according to claim 2, characterized in that said allocation sub-step 
is based on the implementation of a Viterbi algorithm allowing to select a preferred size 
for successive groups of pictures while matching a scene change with the start of such a 
group of pictures. 

7. A method according to any one of claims 2 to 6, characterized in that said
optimum placement sub-step comprises a first long-term optimization operation, for 
finding over several groups of pictures the optimum spacing between the pictures of 
these groups, and a second short-term optimization operation, for finding inside a group 
of pictures the best places of predicted and interpolated pictures. 

8. A method according to any one of claims 2 to 7, characterized in that an
additional inter-picture prediction step is provided in case of change of the order of the 
picture types between two successive iterations. 

9. A variable bit rate video coder comprising a first coding branch, a prediction 
branch, and a control circuit provided for carrying out the implementation of the following 
operations : 

a coding operation of a bitstream corresponding to a picture sequence, 
with a constant quantization stepsize ; 
a prediction operation, for an estimation of the quantization stepsize
allowing to code said bitstream according to a specified target bitrate ; 
at least one repetition of said operations ; 

a final controlling operation, for adjusting the stepsize with respect to 
said target bitrate ; 

characterized in that said control circuit is provided for implementing, between the first 
coding operation and the first prediction operation, a picture re-arrangement operation. 

Abstract

The variable bitrate coding method according to the invention comprises an
iterative process including a first analysis pass and a second prediction pass and followed
by a last control step for adjusting the quantization stepsize with respect to a target
bitrate. According to the invention, a picture re-arrangement step is provided between the
analysis and prediction steps of one iteration, in order to encode the picture sequence
with an improved quality.

Application : MPEG-2 encoders for storage media with limited capacity. 
Fig. : 4 